Clustering and Community Detection in Directed Networks: A Survey
Networks (or graphs) appear as dominant structures in diverse domains,
including sociology, biology, neuroscience and computer science. In most of the
aforementioned cases graphs are directed, in the sense that the edges have a
direction, which makes their semantics non-symmetric.
An interesting feature that real networks present is the clustering or
community structure property, under which the graph topology is organized into
modules commonly called communities or clusters. The essence here is that nodes
of the same community are highly similar, whereas nodes in different
communities show low similarity. Revealing the underlying community
structure of directed complex networks has become a crucial and
interdisciplinary topic with a plethora of applications. Consequently, there
has recently been a wealth of research on mining directed graphs, with
clustering as the primary method and tool for community detection and
evaluation. The goal of this paper is to offer an in-depth review of the
methods presented so far for clustering directed networks, along with the
necessary methodological background and related applications. The survey
commences by offering a concise review of the fundamental concepts and
methodological basis on which graph clustering algorithms build. Then we
present the relevant work along two orthogonal classifications. The first one
is mostly concerned with the methodological principles of the clustering
algorithms, while the second one approaches the methods from the viewpoint of
the properties of a good cluster in a directed network. Further, we
present methods and metrics for evaluating graph clustering results,
demonstrate interesting application domains and provide promising future
research directions. Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear)
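The survey reviews criteria for what makes a good cluster in a directed network. As one widely used example (not necessarily the formulation the survey favors), directed modularity compares the number of intra-community edges with the number expected when edge sources are drawn in proportion to out-degree and targets in proportion to in-degree. The sketch below assumes a simple directed graph given as an edge list; the function name `directed_modularity` is illustrative.

```python
from collections import defaultdict

def directed_modularity(edges, communities):
    """Directed modularity of a partition:
    Q = (1/m) * sum_{i,j} [A_ij - k_i^out * k_j^in / m] * delta(c_i, c_j)

    edges: list of distinct directed edges (u, v)
    communities: dict mapping every node to a community label
    """
    m = len(edges)
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    # Observed term: edges whose source and target share a community.
    observed = sum(1 for u, v in edges if communities[u] == communities[v])
    # Expected term: per community, (total out-degree) * (total in-degree) / m.
    out_tot, in_tot = defaultdict(int), defaultdict(int)
    for node, c in communities.items():
        out_tot[c] += out_deg[node]
        in_tot[c] += in_deg[node]
    expected = sum(out_tot[c] * in_tot[c] for c in out_tot) / m
    return (observed - expected) / m

# Two disjoint directed 2-cycles, split into their natural communities.
edges = [(0, 1), (1, 0), (2, 3), (3, 2)]
print(directed_modularity(edges, {0: "a", 1: "a", 2: "b", 3: "b"}))  # 0.5
```

Putting all four nodes in one community gives Q = 0, since every edge is then intra-community and exactly matches the null-model expectation.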
: Random Walk Diffusion meets Hashing for Scalable Graph Embeddings
Learning node representations is a crucial task with a plethora of
interdisciplinary applications. Nevertheless, as the size of the networks
increases, most widely used models face computational challenges in scaling to
large networks. While there has been a recent effort towards designing algorithms
that deal solely with scalability issues, most of them perform poorly in terms
of accuracy on downstream tasks. In this paper, we study models that
balance the trade-off between efficiency and accuracy. In particular, we
propose , a scalable embedding model that
computes binary node representations. The model
exploits random walk diffusion probabilities via stable random projection
hashing to efficiently compute embeddings in the Hamming space. Our
extensive experimental evaluation on various graphs demonstrates that the
proposed model achieves a good balance between accuracy and efficiency compared
to well-known baseline models on two downstream tasks.
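A minimal sketch of the general idea of hashing random walk diffusion probabilities into binary codes compared in Hamming space. Assumptions, not taken from the abstract: the diffusion matrix is the average of the first `walk_len` powers of the transition matrix D^{-1}A, and each bit is the sign of a Cauchy (1-stable) random projection of a node's diffusion vector (the "sign Cauchy projection" technique). This is not claimed to be the paper's exact hashing scheme; `binary_embeddings`, `walk_len` and `dim` are hypothetical names.

```python
import numpy as np

def diffusion_matrix(adj, walk_len):
    """Average of the first `walk_len` powers of P = D^{-1} A (no isolated nodes assumed)."""
    trans = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    power = np.eye(adj.shape[0])
    acc = np.zeros_like(trans)
    for _ in range(walk_len):
        power = power @ trans  # P^l: l-step random walk probabilities
        acc += power
    return acc / walk_len

def binary_embeddings(adj, dim=64, walk_len=3, seed=0):
    """One bit per random Cauchy projection: 1 if the projection is positive."""
    rng = np.random.default_rng(seed)
    diff = diffusion_matrix(np.asarray(adj, dtype=float), walk_len)
    proj = rng.standard_cauchy(size=(diff.shape[1], dim))  # 1-stable random matrix
    return (diff @ proj > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

# Star graph: node 0 is the hub; the leaves have identical diffusion rows.
star = np.zeros((5, 5))
star[0, 1:] = 1
star[1:, 0] = 1
codes = binary_embeddings(star)
print(codes.shape, hamming(codes[1], codes[2]))
```

Nodes whose diffusion vectors coincide, like the leaves of the star, necessarily receive identical codes, so their Hamming distance is zero.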
Learning Graph Representations for Influence Maximization
As the field of machine learning for combinatorial optimization advances,
traditional problems are resurfacing and being readdressed from this new
perspective. The overwhelming majority of the literature focuses on small graph
problems, while many real-world problems involve large graphs. Here,
we focus on two such problems: influence estimation, a #P-hard counting
problem, and influence maximization, an NP-hard problem. We develop GLIE, a
Graph Neural Network (GNN) that inherently parameterizes an upper bound of
influence estimation and train it on small simulated graphs. Experiments show
that GLIE provides accurate influence estimation for real graphs up to 10 times
larger than the training set. More importantly, it can be used for influence
maximization on considerably larger graphs, as the ranking of the predictions is not
affected by the drop in accuracy. We develop a version of CELF optimization
with GLIE instead of simulated influence estimation, surpassing the benchmark
for influence maximization, although with a computational overhead. To balance
the time complexity and quality of influence, we propose two different
approaches. The first is a Q-network that learns to choose seeds sequentially
using GLIE's predictions. The second defines a provably submodular function
based on GLIE's representations to rank nodes fast while building the seed set.
The latter provides the best combination of time efficiency and influence
spread, outperforming state-of-the-art (SOTA) benchmarks.
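CELF speeds up greedy seed selection by exploiting submodularity: a node's marginal gain can only shrink as the seed set grows, so stale gains kept in a priority queue are upper bounds and only the top entry needs recomputing. The sketch takes the influence estimator as a parameter, the slot that GLIE or Monte Carlo simulation would fill. The `coverage` function in the demo is a hypothetical stand-in (one-hop coverage), not GLIE.

```python
import heapq

def celf(nodes, spread, k):
    """Lazy-forward greedy (CELF) selection of k seeds.

    spread(seed_set) -> estimated influence; lazy evaluation is exact only
    if spread is monotone and submodular.
    """
    seeds, current = [], 0.0
    # Max-heap via negated gains: (-marginal_gain, node, seed_count_when_computed).
    heap = [(-spread({v}), v, 0) for v in nodes]
    heapq.heapify(heap)
    while len(seeds) < k and heap:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain was computed against the current seed set: it is the true maximum.
            seeds.append(v)
            current += -neg_gain
        else:
            # Stale upper bound: recompute against the current seeds and reinsert.
            gain = spread(set(seeds) | {v}) - current
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, current

# Hypothetical stand-in estimator: number of nodes that are seeds or out-neighbors of seeds.
graph = {0: [1, 2, 3], 1: [2], 2: [], 3: [4], 4: [5, 6], 5: [], 6: []}

def coverage(seed_set):
    covered = set(seed_set)
    for v in seed_set:
        covered.update(graph[v])
    return len(covered)

print(celf(list(graph), coverage, 2))  # ([0, 4], 7.0)
```

Here node 4 is recomputed once after node 0 is chosen; its gain stays at 3, so it is accepted without re-evaluating the remaining nodes.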
Time-varying Signals Recovery via Graph Neural Networks
The recovery of time-varying graph signals is a fundamental problem with
numerous applications in sensor networks and time-series forecasting.
Effectively capturing the spatio-temporal information in these signals is
essential for downstream tasks. Previous studies have used the smoothness
of the temporal differences of such graph signals as an initial assumption.
Nevertheless, this smoothness assumption can degrade performance in the
corresponding application when the prior does not hold. In
this work, we relax this assumption by including a learning
module. We propose a Time Graph Neural Network (TimeGNN) for the recovery of
time-varying graph signals. Our algorithm uses an encoder-decoder architecture
with a specialized loss composed of a mean squared error function and a Sobolev
smoothness operator. TimeGNN shows competitive performance against previous
methods on real datasets. Comment: Published in IEEE International Conference on Acoustics, Speech and
Signal Processing (ICASSP) 2023, Greece
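The loss described above combines a reconstruction error with a Sobolev smoothness term. The sketch below is one plausible reading under stated assumptions, not the paper's exact loss: the MSE is computed only on observed entries (given by a binary mask), the smoothness term acts on the temporal differences X D_h as trace((X D_h)^T (L + εI) (X D_h)), i.e. Sobolev order β = 1, and `lam` and `eps` are hypothetical hyperparameters. It is written in NumPy for clarity rather than as a trainable module.

```python
import numpy as np

def temporal_difference(x):
    """X D_h for a (num_nodes, num_steps) signal: column t is x[:, t+1] - x[:, t]."""
    return x[:, 1:] - x[:, :-1]

def sobolev_loss(pred, target, mask, laplacian, lam=0.1, eps=0.05):
    """Masked MSE plus lam * trace((X D_h)^T (L + eps*I) (X D_h)) on the prediction.

    pred, target, mask: arrays of shape (num_nodes, num_steps); mask is 1 where observed.
    laplacian: (num_nodes, num_nodes) combinatorial graph Laplacian.
    """
    n = laplacian.shape[0]
    mse = np.sum(mask * (pred - target) ** 2) / max(mask.sum(), 1)
    diff = temporal_difference(pred)
    # L + eps*I is positive definite, so the penalty is non-negative.
    smooth = np.trace(diff.T @ (laplacian + eps * np.eye(n)) @ diff)
    return mse + lam * smooth

# Two connected nodes whose values both rise by 1 between two time steps.
lap = np.array([[1.0, -1.0], [-1.0, 1.0]])
x = np.array([[0.0, 1.0], [0.0, 1.0]])
print(sobolev_loss(x, x, np.ones_like(x), lap))  # approximately 0.01
```

Because both nodes change by the same amount, the Laplacian part of the penalty vanishes and only the eps regularizer contributes, which is why the example loss is small but non-zero.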
Vulnerability assessment in social networks under cascade-based node departures
In social networks, new users decide to become members, but current users also depart from the network or stop participating in the activities of their community. The departure of a user may affect the engagement of its neighbors in the graph, who in turn may also decide to leave, leading to a disengagement epidemic. We propose a model to capture this cascading effect, based on recent studies of the engagement dynamics of social networks. We introduce a new concept of vulnerability assessment under cascades triggered by the departure of nodes based on their engagement level. Our results indicate that social networks are robust under cascades triggered by randomly selected nodes but highly vulnerable to cascades caused by targeted departures of nodes with a high engagement level.
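A minimal sketch of a departure cascade. Assumption, not stated in the abstract: a node stays engaged only while at least k of its neighbors remain, a threshold rule in the spirit of k-core-based engagement models. `departure_cascade` and `k` are hypothetical names. Nodes are re-evaluated only when a neighbor departs, so the cascade is driven solely by the initial departures.

```python
from collections import deque

def departure_cascade(adj, initial_departures, k):
    """Return the set of departed nodes once the cascade settles.

    adj: dict node -> set of neighbors (undirected graph).
    A remaining node departs as soon as fewer than k of its neighbors remain.
    """
    departed = set(initial_departures)
    remaining = {v: len(nbrs) for v, nbrs in adj.items()}  # neighbors not yet processed as departed
    queue = deque(departed)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in departed:
                continue
            remaining[w] -= 1
            if remaining[w] < k:
                departed.add(w)
                queue.append(w)
    return departed

# Triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(departure_cascade(adj, [4], k=2))  # {3, 4}
print(departure_cascade(adj, [0], k=2))  # {0, 1, 2, 3, 4}
```

Comparing the size of the departed set for randomly chosen versus highly engaged initial departures is one way to probe the robustness contrast the abstract reports.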
Maximizing Influence with Graph Neural Networks
Finding the seed set that maximizes the influence spread over a network is a well-known NP-hard problem. Though a greedy algorithm can provide near-optimal solutions, the subproblem of influence estimation renders the solutions inefficient. In this work, we propose GLIE, a graph neural network that learns how to estimate the influence spread of the independent cascade. GLIE relies on a theoretical upper bound that is tightened through supervised training. Experiments indicate that it provides accurate influence estimation for real graphs up to 10 times larger than the training set. Subsequently, we incorporate it into two influence maximization techniques. We first utilize Cost Effective Lazy Forward (CELF) optimization, replacing Monte Carlo simulations with GLIE, surpassing the benchmarks albeit with a computational overhead. To improve computational efficiency, we develop a provably submodular influence spread based on GLIE's representations to rank nodes while building the seed set adaptively. The proposed algorithms are inductive: they are trained on graphs with fewer than 300 nodes and up to 5 seeds, and tested on graphs with millions of nodes and up to 200 seeds. The final method exhibits the most promising combination of time efficiency and influence quality, outperforming several baselines.
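The Monte Carlo estimator that GLIE replaces inside CELF can be sketched as follows, assuming a uniform activation probability `prob` on every edge (real setups often use per-edge probabilities). Each run activates the seeds, then gives every newly activated node a single chance to activate each inactive out-neighbor; the estimate is the mean number of active nodes over `num_sims` runs. `ic_spread` is a hypothetical name.

```python
import random

def ic_spread(graph, seeds, prob, num_sims=1000, seed=0):
    """Monte Carlo estimate of the expected independent-cascade spread.

    graph: dict node -> list of out-neighbors; prob: activation probability per edge.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(num_sims):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            newly_active = []
            for u in frontier:
                for v in graph.get(u, []):
                    # u gets exactly one attempt on each still-inactive out-neighbor.
                    if v not in active and rng.random() < prob:
                        active.add(v)
                        newly_active.append(v)
            frontier = newly_active
        total += len(active)
    return total / num_sims

chain = {0: [1], 1: [2], 2: [3]}
print(ic_spread(chain, [0], 1.0, num_sims=50))  # 4.0: every attempt succeeds
print(ic_spread(chain, [0], 0.5))               # random estimate, expected value 1.875
```

Each call costs `num_sims` full cascade simulations, which is the overhead that motivates substituting a learned estimator.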
Fast Robustness Estimation in Large Social Graphs: Communities and Anomaly Detection
Given a large social graph, such as a scientific collaboration network, what can we say about its robustness? Can we estimate a robustness index for a graph quickly? If the graph evolves over time, how do these properties change? In this work, we address these questions by studying the expansion properties of large social graphs. First, we present a measure that characterizes the robustness properties of a graph and serves as a global measure of the community structure (or lack thereof). We study how these properties change over time and show how to spot outliers and anomalies over time. We apply our method to several diverse real networks with millions of nodes. We also show how to compute our measure efficiently by exploiting the special spectral properties of real-world networks.
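The abstract does not spell out the robustness measure. As an assumed, commonly used expansion proxy, the sketch computes the spectral gap λ1 − λ2 of the adjacency matrix: a large gap indicates good expansion, while a gap near zero signals weakly connected communities. Dense `eigvalsh` is used for clarity; for graphs with millions of nodes one would instead compute only the two largest eigenvalues of a sparse matrix with a Lanczos solver such as `scipy.sparse.linalg.eigsh(A, k=2, which='LA')`. `adjacency_spectral_gap` is a hypothetical name.

```python
import numpy as np

def adjacency_spectral_gap(adj):
    """lambda_1 - lambda_2 of a symmetric adjacency matrix."""
    eig = np.linalg.eigvalsh(np.asarray(adj, dtype=float))  # eigenvalues in ascending order
    return float(eig[-1] - eig[-2])

# Complete graph K4: eigenvalues 3, -1, -1, -1, so the gap is 4 (well connected).
k4 = np.ones((4, 4)) - np.eye(4)
# Two disjoint triangles: eigenvalue 2 appears twice, so the gap is 0 (two separate communities).
tri = np.ones((3, 3)) - np.eye(3)
two_triangles = np.zeros((6, 6))
two_triangles[:3, :3] = tri
two_triangles[3:, 3:] = tri
print(adjacency_spectral_gap(k4), adjacency_spectral_gap(two_triangles))
```

The two toy graphs bracket the behavior: a gap that shrinks over time in an evolving graph would suggest emerging community structure.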
On the Trade-off between Over-smoothing and Over-squashing in Deep Graph Neural Networks
Graph Neural Networks (GNNs) have succeeded in various computer science applications, yet deep GNNs underperform their shallow counterparts despite deep learning's success in other domains. Over-smoothing and over-squashing are key challenges when stacking graph convolutional layers, hindering deep representation learning and information propagation from distant nodes. Our work reveals that over-smoothing and over-squashing are intrinsically related to the spectral gap of the graph Laplacian, resulting in an inevitable trade-off between these two issues, as they cannot be alleviated simultaneously. To achieve a suitable compromise, we propose adding and removing edges as a viable approach. We introduce the Stochastic Jost and Liu Curvature Rewiring (SJLR) algorithm, which is computationally efficient and preserves fundamental properties compared to previous curvature-based methods. Unlike existing approaches, SJLR performs edge addition and removal during GNN training while leaving the graph unchanged during testing. Comprehensive comparisons demonstrate SJLR's competitive performance in addressing over-smoothing and over-squashing. CCS Concepts: Computing methodologies → Machine learning algorithms; Computer systems organization → Neural networks.
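To illustrate how rewiring moves the spectral gap the abstract refers to, the sketch computes the second-smallest eigenvalue of the normalized Laplacian I − D^{-1/2} A D^{-1/2} before and after adding a single edge. This is not SJLR; it only shows the quantity that edge addition and removal trade off. Under the abstract's argument, raising the gap eases over-squashing but accelerates over-smoothing. `normalized_laplacian_gap` is a hypothetical name.

```python
import numpy as np

def normalized_laplacian_gap(adj):
    """Second-smallest eigenvalue of I - D^{-1/2} A D^{-1/2} (no isolated nodes assumed)."""
    adj = np.asarray(adj, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return float(np.linalg.eigvalsh(lap)[1])  # eigvalsh sorts ascending

# Path 0-1-2-3, then close it into a 4-cycle by adding the edge (0, 3).
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
cycle = path.copy()
cycle[0, 3] = cycle[3, 0] = 1.0
print(normalized_laplacian_gap(path), normalized_laplacian_gap(cycle))  # about 0.5 and 1.0
```

The path's bottleneck gives a gap of 1 − cos(π/3) = 0.5; adding one edge removes the bottleneck and doubles the gap to 1.0.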